AD 550619

ON  THE  ASYMPTOTIC  DIRECTIONS  OF  THE 
S-DIMENSIONAL  OPTIMUM  GRADIENT  METHOD 

BY 

GEORGE  E.  FORSYTHE 


TECHNICAL  REPORT  NO.  CS  61 
APRIL  13,  1967 




COMPUTER  SCIENCE  DEPARTMENT 
School of Humanities and Sciences
STANFORD  UNIVERSITY 


Abstract 

The optimum s-gradient method for minimizing a positive definite quadratic function $f(x)$ on $E_n$ has long been known to converge for $s \ge 1$. For these $s$ the author studies the directions from which the iterates approach their limit, and extends to $s > 1$ a theory proved by Akaike for $s = 1$. It is shown that $f(x_k)$ can never converge to its minimum value faster than linearly, except in degenerate cases where it attains the minimum in one step.
1. Introduction and Summary.


To minimize a smooth real-valued function $f(x)$ of $n$ real variables, the optimum s-gradient method has been described by Birman [3], Faddeev and Faddeeva [5], Khabaza [8], and others. We here consider the model function $f(x) = \tfrac{1}{2} x^T A x$, where $A$ is a positive definite matrix. Then each iterate $x_k$ is equal to its error. The convergence of the method was proved long ago (see (2.14)), and the question now under study is to find the asymptotic manner in which the iterates $x_k \to \theta$, the null vector.

For $s = 1$ it was conjectured by Forsythe and Motzkin [7] and proved by Akaike [1] (see (4.12)) that the iterates $x_k$ converge to $\theta$ by asymptotically alternating between two directions, the "cage" of Stiefel [10]. Thus the convergence of $f(x_k)$ to 0 for $s = 1$ is known to be linear, and no faster than linear, for any start $x_0$ that is not an eigenvector. Moreover, if coordinates are chosen so that $A$ is a diagonal matrix, then the two asymptotic directions have only two nonzero components. Finally, any direction with only two nonzero components is invariant under two steps of the optimum 1-gradient method.

In the present paper the author has extended most of the known results to arbitrary $s > 1$. The main result (3.8) shows that the directions of the even iterates $x_{2k}$ have as a limit set a continuum $R$ (which might be a single direction). Moreover, each direction of $R$ is invariant under two steps of the optimum s-gradient method. Let $A$ be a diagonal matrix. It is then shown in (3.18) that in the optimum s-gradient process $f(x_k)$ converges to 0 no faster than linearly for any initial vector $x_0$ with at least $s + 1$ nonzero components. Theorem (4.7) shows that all vectors of $R$ have between $s + 1$ and $2s$ nonzero coordinates, inclusive. Theorem (4.8) says that any direction with $s + 1$ nonzero components is invariant under two steps of the method, for any $s$. Examples are shown in Sec. 4 of directions with this invariance and with as many as $2s$ nonzero components.

Experimental evidence from computer runs for $s = 2$ suggests strongly that $R$ is always a single point, just as it has been proved to be for $s = 1$. The author conjectures without proof that $R$ is a single point for all $s$, so that $x_k \to \theta$ in an alternating manner completely analogous to the case with $s = 1$.

The  author  is  aware  that  for  minimizing  quadratic  functions  f(x) 
in  practice,  the  conjugate-gradient  method  of  Hestenes  and  Stiefel  (see 
[5])  may  usually  be  expected  to  be  superior  to  the  optimum  s-gradient 
methods, although Khabaza [8] claims superiority for the 3-gradient
method  in  some  cases.  For  nonquadratic  functions  f(x)  the  relative 
merits  of  the  methods  are  less  clear.  The  purpose  of  the  present  inves¬ 
tigation  was  the  intellectual  one  of  trying  to  understand  the  asymptotic 
behavior  of  the  various  gradient  methods  for  quadratic  functions.  The 
author  expects  that  this  information  may  have  some  useful  application 
to  the  minimization  of  general  smooth  functions  f(x)  . 
2. The Optimum s-gradient Method for Quadratic Functions.


Let $f(x)$ be real for all $x$ in real euclidean n-space $E_n$. Let $f(x)$ assume a minimum value for a unique $x$, which can be taken as $\theta$, the origin of $E_n$, without loss of generality in the analysis. The advantage of using $\theta$ is that the iterate $x_k$ is then also its own error $x_k - \theta$ as a minimizing vector. We wish to analyze certain asymptotic properties of a class of optimum gradient methods for finding the minimum of $f(x)$.

The simplest $f$ to analyze is the quadratic function

(2.1)  $f(x) = \tfrac{1}{2} x^T A x$,

where $A$ is a symmetric, positive definite, nonderogatory matrix of order $n$. Moreover, (2.1) represents the local behavior at $\theta$ of $f(x) - f(\theta)$ for most sufficiently smooth functions $f$. The author conjectures that the theorems proved below for a quadratic function apply essentially also to any sufficiently smooth function $f$ which is locally like (2.1). In this paper only quadratic functions will be studied. See Daniel [4] for an investigation comparing gradient methods for quadratic and nonquadratic functions in Hilbert space.

In the various gradient methods one starts with an arbitrary vector $x_0$, and computes a sequence $\{x_k\}$ converging to $\theta$. We assume all arithmetic to be exact, and round-off error is not considered in this paper.

Let $z_k = \operatorname{grad} f(x_k) = A x_k$ denote the gradient of $f$ at $x_k$. In the optimum 1-gradient method [5], $x_{k+1}$ is taken to be the unique point on the line $L_1 = \{x_k + \alpha A x_k : -\infty < \alpha < \infty\}$ for which $F(\alpha) = f(x_k + \alpha A x_k)$ is a minimum. (The existence and uniqueness of $x_{k+1}$ result from the easily proved fact that $F(\alpha)$ is a quadratic function of $\alpha$ with $F''(\alpha) > 0$.) The line $L_1$ through $x_k$ is called the line of steepest descent of $f(x)$ at $x_k$.

For $x \in L_1$, $\operatorname{grad} f(x) = A(x_k + \alpha A x_k) = A x_k + \alpha A^2 x_k$. We therefore consider the 2-dimensional plane through $x_k$,

$L_2 = \{x_k + \alpha_1 A x_k + \alpha_2 A^2 x_k : -\infty < \alpha_1, \alpha_2 < \infty\}$,

and call it the 2-plane of steepest descent of $f(x)$ at $x_k$.

By extension, for any integer $s$ ($1 \le s \le n$) let

$L_s = \{x_k + \sum_{i=1}^{s} \alpha_i A^i x_k : -\infty < \alpha_i < \infty \ (\text{all } i)\}$

be the s-dimensional plane of steepest descent of $f(x)$ at $x_k$. Since $A$ is not derogatory, $A x_k, \ldots, A^n x_k$ are linearly independent, provided $x_0$ is a vector whose minimum polynomial is of degree $n$. In that case $L_n$ is the whole space $E_n$.

In the optimum s-gradient method [5] for minimizing the quadratic function $f$ of (2.1), the point $x_{k+1}$ is defined to be the unique point $y$ in $L_s$ for which $f(y)$ is a minimum ($k = 0, 1, \ldots$). (Again existence and uniqueness follow from the positive definiteness of $A$.) It is the optimum s-gradient methods that we shall analyze in this paper.
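The definition just given can be illustrated numerically. The following is a minimal sketch, not the computing algorithm of [5]: it minimizes $f$ over the plane $L_s$ by solving the small stationarity system directly. The function names, the eigenvalues, and the random start are hypothetical choices made only for the illustration.

```python
import numpy as np

def f(A, x):
    """Quadratic model f(x) = 0.5 * x^T A x."""
    return 0.5 * x @ A @ x

def s_gradient_step(A, x, s):
    """One optimum s-gradient step: minimize f over x + span{A x, ..., A^s x}."""
    # Columns of K are A x, A^2 x, ..., A^s x (the directions spanning L_s).
    cols = []
    v = x
    for _ in range(s):
        v = A @ v
        cols.append(v)
    K = np.column_stack(cols)
    # Stationarity of f(x + K a) in a gives (K^T A K) a = -K^T A x.
    G = K.T @ A @ K
    rhs = -K.T @ (A @ x)
    # lstsq also tolerates a rank-deficient G (fewer than s independent directions).
    a, *_ = np.linalg.lstsq(G, rhs, rcond=None)
    return x + K @ a

rng = np.random.default_rng(0)
lam = np.array([1.0, 2.0, 3.5, 5.0, 8.0])   # hypothetical eigenvalues of A
A = np.diag(lam)
x0 = rng.standard_normal(lam.size)           # hypothetical start vector

x1 = s_gradient_step(A, x0, 2)               # one step with s = 2
z0, z1 = A @ x0, A @ x1                      # gradients before and after the step
```

In this sketch the new gradient `z1` should come out orthogonal to `z0` and to `A @ z0`, which is the orthogonality to $L_s$ used in the analysis that follows.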

We now give two representations of the minimizing $x_{k+1}$ which are useful in the analysis. Actual computing algorithms for the optimum s-gradient method often proceed differently, and find $x_{k+1}$ by taking $s$ steps of the conjugate gradient method, starting from $x_k$. See [5]. We concentrate on the gradients $z_k = A x_k$.


First representation

Let

$x_{k+1} = x_k + \gamma_1 A x_k + \cdots + \gamma_s A^s x_k$.

Then the gradient of $f(x)$ at $x_{k+1}$ is

(2.2)  $z_{k+1} = z_k + \gamma_1 A z_k + \cdots + \gamma_s A^s z_k$.

Since $x_{k+1}$ minimizes $f(y)$ for $y \in L_s$, the vector $z_{k+1}$ must be orthogonal to $L_s$. For this it is necessary and sufficient that $z_{k+1}$ be orthogonal to $z_k, A z_k, \ldots, A^{s-1} z_k$. Then $\gamma_1, \ldots, \gamma_s$ are determined by the $s$ conditions

$$\begin{aligned}
(z_k, z_{k+1}) &= (z_k, z_k) + \gamma_1 (z_k, A z_k) + \cdots + \gamma_s (z_k, A^s z_k) = 0 \\
&\;\;\vdots \\
(A^{s-1} z_k, z_{k+1}) &= (A^{s-1} z_k, z_k) + \gamma_1 (A^{s-1} z_k, A z_k) + \cdots + \gamma_s (A^{s-1} z_k, A^s z_k) = 0 .
\end{aligned}$$

Here $(u, v) = u^T v = v^T u$ denotes the inner product of two column vectors. Since $(A^p z, A^q z) = (z, A^{p+q} z) = z^T A^{p+q} z$, we may write the above equations as

(2.3)
$$\begin{aligned}
z_k^T z_k + \gamma_1 z_k^T A z_k + \cdots + \gamma_s z_k^T A^s z_k &= 0 \\
z_k^T A z_k + \gamma_1 z_k^T A^2 z_k + \cdots + \gamma_s z_k^T A^{s+1} z_k &= 0 \\
&\;\;\vdots \\
z_k^T A^{s-1} z_k + \gamma_1 z_k^T A^s z_k + \cdots + \gamma_s z_k^T A^{2s-1} z_k &= 0 .
\end{aligned}$$

As long as $z_k, A z_k, \ldots, A^{s-1} z_k$ are linearly independent, the equations (2.3) determine the minimizing $\gamma_1, \ldots, \gamma_s$ uniquely.
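As a small worked instance of (2.3), offered only as an illustration: for $s = 1$ the system reduces to a single equation, and solving it recovers the familiar step of the method of steepest descent.

```latex
% Case s = 1 of (2.3): one equation in the single unknown \gamma_1.
z_k^T z_k + \gamma_1\, z_k^T A z_k = 0
\quad\Longrightarrow\quad
\gamma_1 = -\frac{z_k^T z_k}{z_k^T A z_k},
\qquad
x_{k+1} = x_k - \frac{z_k^T z_k}{z_k^T A z_k}\, z_k ,
\qquad z_k = A x_k .
```

The denominator $z_k^T A z_k$ is positive for $z_k \ne \theta$ because $A$ is positive definite, so $\gamma_1$ is well defined and negative.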


Second representation

Let $q_s(t) = t^s + \beta_{s-1} t^{s-1} + \cdots + \beta_0$ denote any monic polynomial of degree $s$, with $\beta_0 \ne 0$. Then

$q_s(A) z_k = A^s z_k + \beta_{s-1} A^{s-1} z_k + \cdots + \beta_0 z_k$,

(2.4)  $\dfrac{q_s(A)}{q_s(0)}\, z_k = \dfrac{1}{\beta_0} A^s z_k + \dfrac{\beta_{s-1}}{\beta_0} A^{s-1} z_k + \cdots + z_k$.

Comparing (2.4) with (2.2), we see that we can write

(2.5)  $z_{k+1} = \dfrac{p_s(A)}{p_s(0)}\, z_k$,

where $p_s(t)$ is the particular polynomial

(2.6)  $p_s(t) = t^s + \dfrac{\gamma_{s-1}}{\gamma_s} t^{s-1} + \cdots + \dfrac{\gamma_1}{\gamma_s} t + \dfrac{1}{\gamma_s}$.

Now $p_s(t)$ is a certain orthogonal polynomial. Without loss of generality assume $A$ to be the diagonal matrix

(2.7)  $A = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$,

where $0 < \lambda_1 < \lambda_2 < \cdots < \lambda_n$ are its eigenvalues (distinct because $A$ is not derogatory).


(2.8) Definition. In the coordinate system corresponding to (2.7), let the nonzero vector $z$ be $(\zeta_1, \ldots, \zeta_n)^T$. Let orthogonality of two polynomials $p(t)$, $q(t)$ (relative to $z$) be defined by

$\langle p(t), q(t) \rangle = \sum_{i=1}^{n} p(\lambda_i)\, q(\lambda_i)\, \zeta_i^2 = 0$.

(2.9) Definition. Let $P_s(t; z) = t^s + \cdots$ be the unique monic polynomial of degree $s$ that, relative to $z$, is orthogonal in the sense of (2.8) to all polynomials of degree $\le s - 1$.

Note that $P_s(t; z)$ depends only on the direction of $z$, and not its magnitude. I.e., $P_s(t; z) = P_s(t; \alpha z)$, for all real $\alpha \ne 0$.

(2.10) Theorem. The polynomial $p_s(t)$ of (2.5), (2.6) is the orthogonal polynomial $P_s(t; z_k)$ defined in (2.9).

We  shall  not  prove  (2.10).  For  a  related  proof  see,  for  example, 
p.  3^9  of  [5].  The  basic  reason  for  (2.10)  is  the  isomorphism,  well 
expounded  by  Stiefel  [11],  between  orthogonality  of  the  polynomials 


$p(t)$, $q(t)$ in the sense of (2.8) and geometric orthogonality of the vectors $p(A)z$, $q(A)z$ in $E_n$. That is,

$\langle p(t), q(t) \rangle = (p(A)z,\ q(A)z)$.

Hence the conditions (2.3) asserting the orthogonality of the vector $z_{k+1} = P_s(A; z_k)\, z_k / P_s(0; z_k)$ to $z_k, A z_k, A^2 z_k, \ldots, A^{s-1} z_k$ are equivalently asserting the orthogonality of the polynomial $P_s(t; z_k)$ to the polynomials $1, t, t^2, \ldots, t^{s-1}$.

In summary $z_{k+1}$ is uniquely determined from $z_k$ by the formula

(2.11)  $z_{k+1} = \dfrac{P_s(A; z_k)}{P_s(0; z_k)}\, z_k$.

Moreover,

(2.12)  $x_{k+1} = \dfrac{P_s(A; z_k)}{P_s(0; z_k)}\, x_k$.
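Formula (2.11) can be checked numerically. The sketch below, under the assumption that $A$ is diagonal as in (2.7), builds the monic orthogonal polynomial of (2.9) from the sums $\sum_i \lambda_i^a \zeta_i^2$ and then forms $z_{k+1}$. The function name `monic_orthogonal_poly` and the sample data are hypothetical and serve only the illustration.

```python
import numpy as np

def monic_orthogonal_poly(lam, z, s):
    """Coefficients (highest power first) of the monic P_s(t; z) of (2.9)."""
    # Sums mu_a = sum_i lam_i^a * zeta_i^2 for a = 0, ..., 2s-1.
    mu = np.array([np.sum(lam**a * z**2) for a in range(2 * s)])
    # Orthogonality to 1, t, ..., t^(s-1): sum_j c_j mu_{m+j} = -mu_{m+s}.
    H = np.array([[mu[m + j] for j in range(s)] for m in range(s)])
    c = np.linalg.solve(H, -mu[s:2 * s])      # c_0, ..., c_{s-1}
    return np.concatenate(([1.0], c[::-1]))   # t^s + c_{s-1} t^(s-1) + ... + c_0

lam = np.array([1.0, 2.0, 3.5, 5.0, 8.0])    # hypothetical eigenvalues of A
z = np.array([1.0, -0.5, 2.0, 0.7, -1.2])    # hypothetical gradient z_k
s = 2

p = monic_orthogonal_poly(lam, z, s)
# Formula (2.11) in diagonal coordinates: componentwise P_s(lam_i) / P_s(0).
z_next = np.polyval(p, lam) * z / np.polyval(p, 0.0)

# Inner products (A^m z_k, z_{k+1}) for m = 0, ..., s-1 should vanish, as in (2.3).
inner = np.array([np.sum(lam**m * z * z_next) for m in range(s)])

# f expressed through the gradient: f(x) = 0.5 * z^T A^{-1} z.
f_before = 0.5 * np.sum(z**2 / lam)
f_after = 0.5 * np.sum(z_next**2 / lam)
```

Solving the small Hankel system directly is adequate for small $s$; for larger $s$ a three-term recurrence would be the more careful choice, which this sketch does not attempt.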


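Relation (2.12) is the basis for a proof by Birman [3] that in the optimum s-gradient method $f(x_k)$ converges to 0 linearly, or faster. To be precise, let $\sigma = (\lambda_n + \lambda_1)(\lambda_n - \lambda_1)^{-1}$. Let $T_s(t)$ denote the Chebyshev polynomial on $[-1, 1]$, normalized so that $\max_{-1 \le t \le 1} |T_s(t)| = 1$. Let

$Q_s(t) = T_s\!\left(\dfrac{\lambda_n + \lambda_1 - 2t}{\lambda_n - \lambda_1}\right)$.

Then $Q_s(0) = T_s(\sigma) > 1$, and $|Q_s(t)| \le 1$, for $\lambda_1 \le t \le \lambda_n$. It is known that

$T_s(\sigma) = \tfrac{1}{2}\left[\left(\sigma + \sqrt{\sigma^2 - 1}\right)^s + \left(\sigma - \sqrt{\sigma^2 - 1}\right)^s\right] > 1$.

Birman's proof goes as follows, where $x_k = (\xi_1^{(k)}, \ldots, \xi_n^{(k)})^T$:

(2.13)
$$\begin{aligned}
f(x_{k+1}) &= f\!\left(\frac{P_s(A; z_k)}{P_s(0; z_k)}\, x_k\right) \\
&\le f\!\left(\frac{Q_s(A)}{Q_s(0)}\, x_k\right), \quad \text{because } P_s(t; z_k) \text{ is the polynomial that minimizes } f(x_{k+1}), \\
&= \frac{1}{2[Q_s(0)]^2}\, x_k^T\, Q_s(A)\, A\, Q_s(A)\, x_k \\
&= \frac{1}{2[Q_s(0)]^2} \sum_{i=1}^{n} \lambda_i\, [Q_s(\lambda_i)]^2\, [\xi_i^{(k)}]^2 \\
&\le \frac{1}{2[Q_s(0)]^2} \sum_{i=1}^{n} \lambda_i\, [\xi_i^{(k)}]^2 \\
&= \frac{1}{[T_s(\sigma)]^2}\, f(x_k) .
\end{aligned}$$

Hence

(2.14)  $\sqrt{f(x_k)} \le \dfrac{1}{[T_s(\sigma)]^k}\, \sqrt{f(x_0)}$,

proving the convergence of $f(x_k)$ to 0 to be linear or faster.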
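The bound (2.14) lends itself to a quick numerical check. The following sketch assumes a diagonal $A$ as in (2.7) and advances the gradient by a weighted least-squares solve, which is one convenient way (not Birman's, and not the algorithm of [5]) to realize the minimization of $f$. The helper names and the test data are hypothetical.

```python
import numpy as np

def step_gradient(lam, z, s):
    """Gradient z_{k+1} after one optimum s-gradient step, for A = diag(lam)."""
    root = np.sqrt(lam)
    # f = 0.5 * ||A^{-1/2} z||^2, so minimizing f over gamma means minimizing
    # ||A^{-1/2} (z + sum_j gamma_j A^j z)||_2, an ordinary least-squares problem.
    target = z / root
    K = np.column_stack([lam**j * target for j in range(1, s + 1)])
    gamma, *_ = np.linalg.lstsq(K, -target, rcond=None)
    return root * (target + K @ gamma)

def f_from_gradient(lam, z):
    """f(x) = 0.5 x^T A x = 0.5 z^T A^{-1} z for z = A x."""
    return 0.5 * np.sum(z**2 / lam)

lam = np.array([1.0, 2.0, 3.5, 5.0, 8.0])      # hypothetical eigenvalues
s = 2
sigma = (lam[-1] + lam[0]) / (lam[-1] - lam[0])
T_s = np.cosh(s * np.arccosh(sigma))            # Chebyshev value T_s(sigma), sigma > 1

rng = np.random.default_rng(1)
z = lam * rng.standard_normal(lam.size)         # z_0 = A x_0 for a random x_0
f0 = f_from_gradient(lam, z)
roots = [np.sqrt(f0)]                           # sqrt(f(x_k)), k = 0, 1, ...
for _ in range(5):
    z = step_gradient(lam, z, s)
    roots.append(np.sqrt(f_from_gradient(lam, z)))
bounds = [np.sqrt(f0) / T_s**k for k in range(6)]   # right side of (2.14)
```

For a generic start one expects `roots[k]` to lie strictly below `bounds[k]`, since equality in (2.14) requires the special eigenvalue configuration discussed later in this section.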


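(2.15) Definition. For $\alpha = 0, \pm 1, \pm 2, \ldots$, let the moments $\mu_\alpha$ of $z = (\zeta_1, \ldots, \zeta_n)^T$ be defined by

$\mu_\alpha = \sum_{i=1}^{n} \lambda_i^{\alpha}\, \zeta_i^2$.

(2.16) Theorem. Fix $s \ge 1$. Except for a constant factor, the orthogonal polynomial $P_s(t; z)$ of (2.9) can be expressed by the determinant

(2.17)
$$P_s(t; z) = \begin{vmatrix}
\mu_0 & \mu_1 & \cdots & \mu_{s-1} & 1 \\
\mu_1 & \mu_2 & \cdots & \mu_s & t \\
\vdots & \vdots & & \vdots & \vdots \\
\mu_s & \mu_{s+1} & \cdots & \mu_{2s-1} & t^s
\end{vmatrix} .$$

The proof is left to the reader.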

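In the next theorem we give an explicit representation for the ratio $f(x_{k+1})/f(x_k)$ in terms of the moments of $z_k$.

(2.18) Theorem. Fix $s \ge 1$. Let $x_k$ be any vector in the optimum s-gradient method, and let $\mu_\alpha$ be the moments defined by (2.15) for the gradient vector $z_k = A x_k$. Then

$$\frac{f(x_{k+1})}{f(x_k)} = \frac{1}{\mu_{-1} M_{-1}}
\begin{vmatrix}
\mu_{-1} & \mu_0 & \cdots & \mu_{s-1} \\
\mu_0 & \mu_1 & \cdots & \mu_s \\
\vdots & \vdots & & \vdots \\
\mu_{s-1} & \mu_s & \cdots & \mu_{2s-1}
\end{vmatrix},$$

where $M_{-1}$ is the minor determinant of $\mu_{-1}$ in the above determinant:

$$M_{-1} = \begin{vmatrix}
\mu_1 & \mu_2 & \cdots & \mu_s \\
\mu_2 & \mu_3 & \cdots & \mu_{s+1} \\
\vdots & \vdots & & \vdots \\
\mu_s & \mu_{s+1} & \cdots & \mu_{2s-1}
\end{vmatrix} .$$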


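Proof. We have $2f(x_k) = x_k^T A x_k = z_k^T A^{-1} z_k = \mu_{-1}$. To simplify the notation, let $z_k = (\zeta_1, \ldots, \zeta_n)^T$ and $z_{k+1} = (\zeta_1', \ldots, \zeta_n')^T$. Then

$\zeta_i' = \dfrac{P_s(\lambda_i; z_k)}{P_s(0; z_k)}\, \zeta_i = \dfrac{P_s(\lambda_i; z_k)}{(-1)^s M_{-1}}\, \zeta_i$,  by (2.11),

where we use the representation (2.17) for $P_s(t; z_k)$. Then

(2.19)
$$2f(x_{k+1}) = z_{k+1}^T A^{-1} z_{k+1} = \frac{1}{M_{-1}^2} \sum_{i=1}^{n} [P_s(\lambda_i; z_k)]^2\, \zeta_i^2\, \lambda_i^{-1}
= \frac{1}{M_{-1}^2} \sum_{i=1}^{n} P_s(\lambda_i; z_k)\, \frac{P_s(\lambda_i; z_k)}{\lambda_i}\, \zeta_i^2 .$$

Now $P_s(t; z_k)$ is orthogonal in the sense of (2.8) to all polynomials of degree $\le s - 1$. Hence the only term of $P_s(\lambda_i; z_k)/\lambda_i$ that contributes anything nonzero to the sum (2.19) is the term $(-1)^s M_{-1}/\lambda_i$. Hence

$$\begin{aligned}
2f(x_{k+1}) &= \frac{(-1)^s}{M_{-1}} \sum_{i=1}^{n} P_s(\lambda_i; z_k)\, \zeta_i^2 / \lambda_i \\
&= \frac{(-1)^s}{M_{-1}} \sum_{i=1}^{n}
\begin{vmatrix}
\mu_0 & \mu_1 & \cdots & \mu_{s-1} & \zeta_i^2/\lambda_i \\
\mu_1 & \mu_2 & \cdots & \mu_s & \zeta_i^2 \\
\vdots & \vdots & & \vdots & \vdots \\
\mu_s & \mu_{s+1} & \cdots & \mu_{2s-1} & \lambda_i^{s-1} \zeta_i^2
\end{vmatrix} \\
&= \frac{(-1)^s}{M_{-1}}
\begin{vmatrix}
\mu_0 & \mu_1 & \cdots & \mu_{s-1} & \mu_{-1} \\
\mu_1 & \mu_2 & \cdots & \mu_s & \mu_0 \\
\vdots & \vdots & & \vdots & \vdots \\
\mu_s & \mu_{s+1} & \cdots & \mu_{2s-1} & \mu_{s-1}
\end{vmatrix} .
\end{aligned}$$

Dividing $2f(x_{k+1})$ by $2f(x_k) = \mu_{-1}$ and rearranging the columns of the last determinant proves theorem (2.18).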
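The determinant formula of theorem (2.18) can be compared with a direct computation of the ratio. The sketch below is only an illustrative check for $s = 1$ and $s = 2$ with a diagonal $A$; the helper names and the sample vectors are hypothetical.

```python
import numpy as np

def ratio_from_moments(lam, z, s):
    """f(x_{k+1}) / f(x_k) from the moment determinants of theorem (2.18)."""
    mu = lambda a: np.sum(lam**a * z**2)          # moments (2.15) of z = A x_k
    # (s+1) x (s+1) matrix with entry mu_{i+j-1}; its first row starts at mu_{-1}.
    full = np.array([[mu(i + j - 1) for j in range(s + 1)] for i in range(s + 1)])
    minor = full[1:, 1:]                          # minor M_{-1} of the entry mu_{-1}
    return np.linalg.det(full) / (mu(-1) * np.linalg.det(minor))

def ratio_direct(lam, z, s):
    """The same ratio, obtained by minimizing f over the plane L_s directly."""
    target = z / np.sqrt(lam)                     # A^{-1/2} z, so f = 0.5 ||target||^2
    K = np.column_stack([lam**j * target for j in range(1, s + 1)])
    gamma, *_ = np.linalg.lstsq(K, -target, rcond=None)
    resid = target + K @ gamma                    # A^{-1/2} z_{k+1}
    return np.sum(resid**2) / np.sum(target**2)

lam = np.array([1.0, 2.0, 3.5, 5.0, 8.0])        # hypothetical eigenvalues
z = np.array([1.0, -0.5, 2.0, 0.7, -1.2])        # hypothetical gradient z_k

r1_moments, r1_direct = ratio_from_moments(lam, z, 1), ratio_direct(lam, z, 1)
r2_moments, r2_direct = ratio_from_moments(lam, z, 2), ratio_direct(lam, z, 2)
```

Determinants of moment matrices become ill-conditioned quickly as $s$ grows, so the comparison is restricted here to small $s$.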

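(2.20) Corollary. In the notation of theorem (2.18), for $s = 1$,

(2.21)  $\dfrac{f(x_{k+1})}{f(x_k)} = \dfrac{\mu_1 \mu_{-1} - \mu_0^2}{\mu_1 \mu_{-1}}$.

If $n = 2$ and $s = 1$, then

(2.22)  $\dfrac{f(x_{k+1})}{f(x_k)} = \dfrac{\zeta_1^2 \zeta_2^2 (\lambda_2 - \lambda_1)^2}{(\lambda_2 \zeta_1^2 + \lambda_1 \zeta_2^2)(\lambda_1 \zeta_1^2 + \lambda_2 \zeta_2^2)} = c^2 = c^2(x_k)$.

Proof. The second expression comes from the first by using (2.15) and (2.21), where $z_k = (\zeta_1, \zeta_2)^T$, with some algebraic manipulation.

(2.23) Corollary. The expression (2.22) for $f(x_{k+1})/f(x_k)$ is unchanged if $(\zeta_1, \zeta_2)^T$ is changed to $(\zeta_2, -\zeta_1)^T$.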
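The inequality (2.13) yields an upper bound for the expression in (2.18). We may state this result in the form of the following inequality, valid for $s = 1, 2, \ldots$:

(2.24)
$$\frac{\begin{vmatrix}
\mu_{-1} & \mu_0 & \mu_1 & \cdots & \mu_{s-1} \\
\mu_0 & \mu_1 & \mu_2 & \cdots & \mu_s \\
\vdots & \vdots & \vdots & & \vdots \\
\mu_{s-1} & \mu_s & \mu_{s+1} & \cdots & \mu_{2s-1}
\end{vmatrix}}
{\mu_{-1} \begin{vmatrix}
\mu_1 & \cdots & \mu_s \\
\vdots & & \vdots \\
\mu_s & \cdots & \mu_{2s-1}
\end{vmatrix}}
\le \frac{1}{\left[T_s\!\left(\dfrac{\lambda_n + \lambda_1}{\lambda_n - \lambda_1}\right)\right]^2} .$$

This is essentially the inequality of Meinardus [8a], who derived it by the same argument for a slightly different iteration in which $\|x\|^2$ is minimized instead of $f(x)$.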

The special case for $s = 1$,

(2.25)  $\dfrac{\mu_{-1}\mu_1 - \mu_0^2}{\mu_{-1}\mu_1} \le \left[T_1\!\left(\dfrac{\lambda_n + \lambda_1}{\lambda_n - \lambda_1}\right)\right]^{-2} = \left(\dfrac{\lambda_n - \lambda_1}{\lambda_n + \lambda_1}\right)^2$,

is a well-known inequality of Kantorovich; see (8) on p. 410 of [5].

It was stated by Birman [3] that the bound (2.14) is sharp, in the sense that for each $s$ and each given $\lambda_1$, $\lambda_n$ ($s < n$), one can find $A$ and $x_0$ so that (2.14) is an equality for all $k$. This is done by finding a set of $\lambda_i$ and $\xi_i^{(0)}$ so that the shifted Chebyshev polynomial $Q_s(t)$ is (up to a scalar factor) identical with $P_s(t; z_0)$ and so that $|Q_s(\lambda_i)| = 1$ for each eigenvalue $\lambda_i$. This is known to be possible because the Chebyshev polynomials, like cosines, are orthogonal with respect to summation over certain points.
However, Birman did not investigate the actual manner or rate of convergence of $f(x_k)$ to 0 in the optimum s-gradient method for a general given $A$ and $x_0$. He left open the question of whether the convergence of $f(x_k)$ to 0 might actually be faster than linear in certain nontrivial cases.

For $s = 1$ Forsythe and Motzkin [7] conjectured that if the components $\xi_i^{(k)}$ of $x_k$ satisfy $\xi_1^{(0)} \xi_n^{(0)} \ne 0$, then $\xi_i^{(k)} = o(\|x_k\|)$, as $k \to \infty$, for all $i$ with $1 < i < n$. In words, $x_k \to \theta$ asymptotically in the 2-space $\pi_{1,n}$ spanned by the eigenvectors belonging to $\lambda_1$ and $\lambda_n$. The conjecture was proved by Forsythe and Motzkin (unpublished) only for $n = 3$. Akaike [1] proved the conjecture for arbitrary $n$. In an unpublished manuscript Arms [2] had found a similar proof. We give a proof in (4.12) as a consequence of our result (3.8) for the s-gradient method.

Suppose the optimum 1-gradient process is performed entirely in the two-dimensional space $\pi_{1,n}$. Then, if $x_0 \in \pi_{1,n}$ and $x_0$ is not an eigenvector, it is easy to prove that:

(i) $x_0, x_2, x_4, \ldots$ are all collinear vectors, and that $x_1, x_3, x_5, \ldots$ are also collinear in another direction. Furthermore, $x_{2k+2} = c^2 x_{2k}$ and $x_{2k+1} = c^2 x_{2k-1}$, for all $k$. Here $c^2$ is given by (2.22). The basic reason why these vectors are collinear is that the gradients $z_{k+1}$ and $z_k$ must always be perpendicular in any optimum gradient method.

(ii) Moreover, for each $k = 0, 1, \ldots$, $f(x_{k+1})/f(x_k) = c^2$. This is an immediate consequence of Corollary (2.23). Hence $f(x_k) \downarrow 0$ in a strictly linear fashion, like the k-th term of a convergent geometric series, even though the vectors $x_k$ alternate between two fixed directions.
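Statements (i) and (ii) are easy to observe numerically in the plane. The sketch below runs the optimum 1-gradient method for $n = 2$ and compares the iterates with the constant $c^2$ of (2.22); the eigenvalues and the start vector are hypothetical.

```python
import numpy as np

lam = np.array([1.0, 4.0])        # hypothetical eigenvalues, n = 2
A = np.diag(lam)

def f(x):
    """Quadratic model f(x) = 0.5 * x^T A x."""
    return 0.5 * x @ A @ x

def steepest_descent_step(x):
    """Optimum 1-gradient step: exact line minimization along z = A x."""
    z = A @ x
    return x - (z @ z) / (z @ A @ z) * z

x0 = np.array([2.0, 1.0])         # hypothetical start, not an eigenvector
z0 = A @ x0                       # its gradient (zeta_1, zeta_2)

# The constant c^2 of (2.22), written in the components of z_0.
c2 = (z0[0]**2 * z0[1]**2 * (lam[1] - lam[0])**2) / (
    (lam[1] * z0[0]**2 + lam[0] * z0[1]**2)
    * (lam[0] * z0[0]**2 + lam[1] * z0[1]**2))

xs = [x0]
for _ in range(6):
    xs.append(steepest_descent_step(xs[-1]))
ratios = [f(xs[k + 1]) / f(xs[k]) for k in range(6)]
```

With these data every entry of `ratios` agrees with `c2`, and `xs[2]` is `c2 * xs[0]`, in line with (i) and (ii).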

It is a consequence of the Forsythe-Motzkin-Arms-Akaike result on the manner of convergence of $x_k$ to $\theta$ in $E_n$ for $s = 1$ that the iteration behaves asymptotically, as $k \to \infty$, as though it were entirely in the two-space $\pi_{1,n}$. The vectors $x_k$ behave ultimately as though they had resulted from an iteration started with some $x^*$ in $\pi_{1,n}$. In particular, we find that $f(x_k) \downarrow 0$ linearly, in the sense that

$\lim_{k \to \infty} \dfrac{f(x_{k+1})}{f(x_k)} = c^2(x^*)$.

However, the vector $x^*$ is an extremely complex function of $x_0$.

Till now, the asymptotic nature of the optimum s-gradient method has not been described for $s > 1$. This problem, posed on p. 314 of Forsythe [6], is studied in the next section.


3. Asymptotic Behavior of the s-gradient Method.

We are still assuming $A$ to have distinct positive eigenvalues $\lambda_1 < \lambda_2 < \cdots < \lambda_n$. Fix any $s$ with $1 \le s$. Motivated by (2.11) and by Akaike's approach [1] for $s = 1$, we shall consider the transformation

(3.1)  $w' = P_s(A; w)\, w$.

Here $w \ne \theta$ and $P_s(t; w) = t^s + \cdots$ is the orthogonal polynomial defined in (2.9). Let

(3.2)  $\varphi(w) = \dfrac{\|w'\|^2}{\|w\|^2} = \dfrac{\|P_s(A; w)\, w\|^2}{\|w\|^2}$,

where $\|u\|$ denotes the euclidean length of $u$.

Similarly, if $w' \ne \theta$, let $w'' = P_s(A; w')\, w'$, so that

$\varphi(w') = \dfrac{\|w''\|^2}{\|w'\|^2}$.

The following theorem is of basic importance to our analysis of the asymptotic behavior of the s-gradient method.

(3.3) Theorem. Let $\psi$ be the angle between $w$ and $w''$. For any $w$ such that $w'' \ne \theta$, we have

$\varphi(w) = \dfrac{\|w'\|^2}{\|w\|^2} = \cos^2\psi \cdot \dfrac{\|w''\|^2}{\|w'\|^2} \le \dfrac{\|w''\|^2}{\|w'\|^2} = \varphi(w')$,

and there is equality if and only if $w'' = cw$ for some scalar $c > 0$.

Proof. By the Cauchy-Schwarz inequality and the definition of $\psi$,

(3.4)  $(w^T w'')^2 = \cos^2\psi\, \|w\|^2 \|w''\|^2 \le \|w\|^2 \|w''\|^2$,

with equality if and only if $w = cw''$, for $c \ne 0$.

Now, writing $D(t) = P_s(t; w) - P_s(t; w')$,

$$\begin{aligned}
\|w'\|^2 - w^T w'' &= \|P_s(A; w)\, w\|^2 - w^T P_s(A; w')\, P_s(A; w)\, w \\
&= w^T [P_s(A; w)]^2\, w - w^T P_s(A; w')\, P_s(A; w)\, w \\
&= w^T P_s(A; w)\, \bigl(P_s(A; w) - P_s(A; w')\bigr)\, w \\
&= w^T P_s(A; w)\, D(A)\, w \\
&= 0,
\end{aligned}$$

by (2.3), because $D(t)$ is a polynomial of degree at most $s - 1$, since the leading terms $t^s$ cancel. Hence $\|w'\|^2 = w^T w''$, whence

(3.5)  $\|w'\|^4 = (w^T w'')^2$.

Combining (3.4) with (3.5), we have

$\|w'\|^4 = \cos^2\psi\, \|w\|^2 \|w''\|^2 \le \|w\|^2 \|w''\|^2$,

with equality if and only if $w'' = cw$. That $c > 0$ follows from the fact that $w^T w'' = \|w'\|^2 > 0$. This proves theorem (3.3).
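The identity $\|w'\|^2 = w^T w''$ and the monotonicity asserted in theorem (3.3) can be observed numerically. The sketch below assumes a diagonal $A$ and computes $P_s(A; w)\,w$ as the part of $A^s w$ orthogonal to $w, Aw, \ldots, A^{s-1}w$, which is one convenient realization of (2.9); the names `apply_P` and `phi` and the test data are hypothetical.

```python
import numpy as np

def apply_P(lam, w, s):
    """Return w' = P_s(A; w) w for A = diag(lam)."""
    # P_s(A; w) w equals A^s w minus its projection onto span{w, A w, ..., A^(s-1) w}.
    K = np.column_stack([lam**j * w for j in range(s)])
    b = lam**s * w
    coef, *_ = np.linalg.lstsq(K, b, rcond=None)
    return b - K @ coef

def phi(lam, w, s):
    """The ratio (3.2): ||w'||^2 / ||w||^2."""
    w1 = apply_P(lam, w, s)
    return (w1 @ w1) / (w @ w)

lam = np.array([1.0, 2.0, 3.5, 5.0, 8.0])    # hypothetical eigenvalues
s = 2
rng = np.random.default_rng(3)
w = rng.standard_normal(lam.size)             # hypothetical vector w

w1 = apply_P(lam, w, s)                       # w'
w2 = apply_P(lam, w1, s)                      # w''
cos2 = (w @ w2)**2 / ((w @ w) * (w2 @ w2))    # cos^2 of the angle between w and w''
phi_w, phi_w1 = phi(lam, w, s), phi(lam, w1, s)
```

In exact arithmetic `phi_w` equals `cos2 * phi_w1`, so the gap between `phi_w` and `phi_w1` measures how far $w''$ is from being parallel to $w$.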
(3.6) Definition. Fix $s$ with $1 \le s \le n - 1$. Fix a euclidean coordinate system in $E_n$ so $A$ takes the form (2.7). Let $\Sigma$ be the unit sphere in $E_n$. Define $\Sigma^* \subset \Sigma$ to consist of all unit vectors $y$ with at least $s + 1$ nonzero components. We define a transformation $T: \Sigma^* \to \Sigma^*$, as follows: For each $y$ in $\Sigma^*$, let $y' = Ty = w/\|w\|$, where $w = P_s(A; y)\, y$. (That $w \ne \theta$ and $y' \in \Sigma^*$ are proved in theorem (5.1).)

(3.7) Definition. By a continuum we mean a closed connected set in $E_n$, with the understanding that a single point is a continuum.

(3.8) Theorem. Fix $s$ with $1 \le s \le n - 1$. Let $y_0 = (\eta_1^{(0)}, \ldots, \eta_n^{(0)})^T$ be any vector in $\Sigma$ with $\eta_i^{(0)} \ne 0$ ($i = 1, \ldots, n$). For $k = 0, 1, \ldots$, define $y_{k+1} = T y_k$, where $T$ was defined in (3.6). Then the set of limit points of the sequence $\{y_{2k} : k = 0, 1, 2, \ldots\}$ of normalized gradients is a continuum $R \subset \Sigma^*$. Moreover, for any point $r$ in $R$, we have $r = T^2 r = T(Tr)$.

Proof. Let $w_0 = y_0$. For $k = 0, 1, \ldots$, let $w_{k+1} = P_s(A; y_k)\, w_k$, where $P_s(t; y)$ was defined in (2.9). It is easily shown that $y_k = w_k/\|w_k\|$, for all $k$. Since $n \ge s + 1$ components of $w_0$ are nonzero, it follows from theorem (5.1) that at least $s + 1$ components of $w_k$ are nonzero for $k = 1, 2, \ldots$. Hence no $w_k = \theta$.

Let $w_k = (\omega_1^{(k)}, \ldots, \omega_n^{(k)})^T$. By theorem (3.3),

$\varphi(w_0) \le \varphi(w_1) \le \cdots \le \varphi(w_k) \le \cdots$.

But for each $k$ the $s$ zeros of $P_s(t; w_k)$ lie in the interval $(\lambda_1, \lambda_n)$. Hence $|P_s(t; w_k)| \le (\lambda_n - \lambda_1)^s$, for $\lambda_1 \le t \le \lambda_n$, and so

$\varphi(w_k) = \dfrac{\sum_{i=1}^{n} [P_s(\lambda_i; w_k)]^2\, [\omega_i^{(k)}]^2}{\sum_{i=1}^{n} [\omega_i^{(k)}]^2} \le (\lambda_n - \lambda_1)^{2s}$  for all $k$.

As a monotone bounded sequence, $\{\varphi(w_k)\}$ has a limit $L$. Hence

(3.9)  $\varphi(w_{k+1}) - \varphi(w_k) \to 0$  (as $k \to \infty$).

But, by theorem (3.3),

(3.10)  $\varphi(w_{k+1}) - \varphi(w_k) = \dfrac{\|w_{k+2}\|^2}{\|w_{k+1}\|^2}\, [1 - \cos^2\psi_k]$,

where $\psi_k$ is the angle between $w_k$ and $w_{k+2}$. Then, by (3.9), $\cos^2\psi_k \to 1$ and $\psi_k \to 0$, as $k \to \infty$. (Since $c > 0$ in (3.3), $\psi_k \not\to \pi$.)

Now consider the set $Y$ of unit vectors $\{y_{2k} : k = 0, 1, 2, \ldots\}$. As an infinite subset of the compact unit sphere $\Sigma$, $\{y_{2k}\}$ has limit points; let $R$ be the set of all limit points of $Y$. Since $\psi_k \to 0$, as $k \to \infty$, we have $\|y_{2k+2} - y_{2k}\| \to 0$, as $k \to \infty$. Then, as Ostrowski shows on p. 203 of [9], the set $R$ must be a continuum in the sense of (3.7).

Let $r$ be any point of $R$. Then there is a subsequence $\{y_{2k_i}\}$ converging to $r$. Since $\|y_{2k_i+2} - y_{2k_i}\| \to 0$, we have also that $y_{2k_i+2} = T^2 y_{2k_i} \to r$. But $T$ is a continuous transformation. Hence $T^2 y_{2k_i} \to T^2 r$, and $T^2 r = r$. Since $T^2 r = r$, we see from theorem (5.1) that $r \in \Sigma^*$. Hence $R \subset \Sigma^*$. This completes the proof of theorem (3.8).
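The sequence $\{y_k\}$ of theorem (3.8) can be followed numerically. The sketch below iterates the transformation $T$ of (3.6) for a diagonal $A$ and records $\varphi(y_k)$ together with the displacements $\|y_{2k+2} - y_{2k}\|$. It is an illustration only: the function name `T_map`, the eigenvalues, and the random start are hypothetical, and the run proves nothing about the set $R$.

```python
import numpy as np

def T_map(lam, y, s):
    """Return (Ty, phi(y)) for the transformation T of (3.6), with A = diag(lam)."""
    # w = P_s(A; y) y is the part of A^s y orthogonal to y, A y, ..., A^(s-1) y.
    K = np.column_stack([lam**j * y for j in range(s)])
    b = lam**s * y
    coef, *_ = np.linalg.lstsq(K, b, rcond=None)
    w = b - K @ coef
    return w / np.linalg.norm(w), (w @ w) / (y @ y)

lam = np.array([1.0, 2.0, 3.5, 5.0, 8.0])    # hypothetical eigenvalues
s = 2
rng = np.random.default_rng(2)
y = rng.standard_normal(lam.size)
y /= np.linalg.norm(y)                        # y_0 on the unit sphere

phis, jumps = [], []
for k in range(200):
    y1, p0 = T_map(lam, y, s)                 # y_{2k+1} and phi(y_{2k})
    y2, p1 = T_map(lam, y1, s)                # y_{2k+2} and phi(y_{2k+1})
    phis.extend([p0, p1])
    jumps.append(np.linalg.norm(y2 - y))      # ||y_{2k+2} - y_{2k}||
    y = y2
```

The list `phis` should be nondecreasing and bounded by $(\lambda_n - \lambda_1)^{2s}$, as in the proof above, while `jumps` gives a direct view of how quickly the even iterates settle.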

The author has programmed a number of test cases with $s = 2$, to investigate the nature of the set $R$. In every case, $R$ appeared to be a single point. The author conjectures that $R$ is always a single point in theorem (3.8). So far, this has been proved only for $s = 1$, and we give the proof in (4.12).

The following theorem shows one way in which one might be able to prove that $R$ consists always of a single point.

(3.11) Theorem. Suppose in the proof of theorem (3.8) that $\varphi(w_k)$ were to converge to $L$ so rapidly that, for some $\alpha < 1$,

(3.12)  $0 < \varphi(w_{k+1}) - \varphi(w_k) \le \alpha\, [\varphi(w_k) - \varphi(w_{k-1})]$  for all $k$.

Then $R$ would consist of a single point.

Proof. If (3.12) held, then the following infinite series would be convergent:

(3.13)  $\sum_{k=0}^{\infty} [\varphi(w_{k+1}) - \varphi(w_k)]^{1/2} < \infty$,

as is seen from (3.12), by the ratio test. It is shown in (3.10) that

(3.14)  $[\varphi(w_{k+1}) - \varphi(w_k)]^{1/2} \sim |\sin\psi_k|$,  as $k \to \infty$,

where $\psi_k$ is the angle between the vectors $w_k$ and $w_{k+2}$. Then, from (3.13) and (3.14), we would have

(3.15)  $\sum_{k=0}^{\infty} |\psi_k| < \infty$.

Now, let $y_k = w_k/\|w_k\|$ be the unit vector in the direction of $w_k$. It would follow from (3.15) that

$\sum_{k=0}^{\infty} \|y_{2k+2} - y_{2k}\| < \infty$,

whence

(3.16)  $\sum_{k=0}^{\infty} (y_{2k+2} - y_{2k})$

would be an absolutely convergent series of vectors. Since

$y_{2k} = \sum_{h=0}^{k-1} (y_{2h+2} - y_{2h}) + y_0$,

we see that the sequence $\{y_{2k}\}$ would then have one limit point. This proves the theorem (3.11).

However, the author sees no way to prove (3.12) nor the conjecture. The following theorem proves that, whether $R$ has one point or an infinite number, $f(x_k) \to 0$ no faster than linearly.


(3.17) Theorem. Fix $s$ with $1 \le s \le n - 1$. Given any $A$ in the form (2.7). Let $x_0 = (\xi_1^{(0)}, \ldots, \xi_n^{(0)})^T$ be any vector in $E_n$ with $m$ nonzero components. Then in the optimum s-gradient method $f(x_k)$ converges to 0 in the following ways:

(i) If $m \le s$, then $x_1 = \theta$, $f(x_1) = 0$, and the iteration terminates in one step.

(ii) If $s + 1 \le m$, then the convergence of $f(x_k)$ to 0 is asymptotically linear, in the sense that there exist constants $c_1$, $c_2$ depending on $x_0$, with

(3.18)  $0 < c_1 \le \dfrac{f(x_{2k+2})}{f(x_{2k})} \le c_2 < 1$,  for all $k$.
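Both cases of theorem (3.17) can be illustrated numerically for a diagonal $A$. The sketch below uses a weighted least-squares solve to advance the gradient, which is only one convenient realization of the method; the helper names and the two start vectors are hypothetical. The upper limit used for the two-step ratio is the square of the one-step factor $[T_s(\sigma)]^{-2}$ of (2.13).

```python
import numpy as np

def step_gradient(lam, z, s):
    """Gradient z_{k+1} after one optimum s-gradient step, for A = diag(lam)."""
    root = np.sqrt(lam)
    target = z / root                         # A^{-1/2} z, so f = 0.5 ||target||^2
    K = np.column_stack([lam**j * target for j in range(1, s + 1)])
    gamma, *_ = np.linalg.lstsq(K, -target, rcond=None)
    return root * (target + K @ gamma)

def f_from_gradient(lam, z):
    """f(x) = 0.5 z^T A^{-1} z for z = A x."""
    return 0.5 * np.sum(z**2 / lam)

lam = np.array([1.0, 2.0, 3.5, 5.0, 8.0])    # hypothetical eigenvalues
s = 2
sigma = (lam[-1] + lam[0]) / (lam[-1] - lam[0])
T_s = np.cosh(s * np.arccosh(sigma))          # Chebyshev value T_s(sigma)

# Case (i): x_0 has m = 2 <= s nonzero components, so one step reaches the minimum.
z_i = lam * np.array([1.0, 0.0, 0.0, -2.0, 0.0])
f_after_one = f_from_gradient(lam, step_gradient(lam, z_i, s))

# Case (ii): x_0 has m = 5 >= s + 1 nonzero components.
z = lam * np.array([1.0, -0.5, 2.0, 0.7, -1.2])
values = [f_from_gradient(lam, z)]
for _ in range(12):
    z = step_gradient(lam, z, s)
    values.append(f_from_gradient(lam, z))
two_step_ratios = [values[2 * k + 2] / values[2 * k] for k in range(6)]
```

In case (ii) the entries of `two_step_ratios` stay strictly between 0 and $[T_s(\sigma)]^{-4}$, which is the behavior that (3.18) describes.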


Proof. We may ignore any zero components of $x_0$ as they remain zero throughout the iteration. We are thus minimizing $f(x)$ in $E_m$.

Proof of (i): If $m \le s$, then the subspace $L_s$ defined in Sec. 2 is $E_m$. Hence $x_1 = \theta$ and $f(x_1) = 0$, the minimum of $f(x)$ in $E_m$.

Proof of (ii): That

$\dfrac{f(x_{2k+2})}{f(x_{2k})} \le c_2 < 1$

follows from the chain of inequalities (2.13). We have to prove the inequalities involving $c_1$.

Given $x_0$ with at least $s + 1$ nonzero components. By theorem (5.1) all other vectors $x_k$ have at least $s + 1$ nonzero components, so that no $x_k = \theta$. By theorem (3.8), the normalized gradient vectors $y_{2k}$ have as a limit set a continuum $R$. For each point $r$ in $R$, we have $T^2 r = r$. Suppose a position vector $x$ were such that $r = Ax/\|Ax\| \in R$. That is, $x$ would be in the direction of $A^{-1} r$. Let $x''$ be the result of two steps of the optimum s-gradient method applied to $x$. Since $T^2 r = r$, we see that $x''$ would be in the same direction as $x$. Hence

(3.19)  $x'' = \gamma x$  and so  $f(x'') = \gamma^2 f(x)$,

for some $\gamma$ with $0 < \gamma = \gamma(r) < 1$.

I.e., for each point $r$ of $R$ there is a positive real number $\gamma(r)$ such that whenever the gradient of a vector $x$ lies in the direction of $r$, then (3.19) holds.

Let $C$ be the minimum of $\gamma(r)$ for $r \in R$. Since $R$ is compact, the minimum is assumed and $C > 0$. Hence

(3.20)  $0 < C^2 \le \dfrac{f(x'')}{f(x)}$,

for all $x$ such that $Ax/\|Ax\| \in R$.

Now the ratio $f(x'')/f(x)$ is a continuous function of $x$. Let $N(R) \subset \Sigma$ be such a neighborhood of $R$ that

(3.21)  $\tfrac{1}{2} C^2 \le \dfrac{f(x'')}{f(x)}$,

for all $x$ with $Ax/\|Ax\|$ in $N(R)$. Consider the sequence $\{x_{2k}\}$. Let $z_{2k} = A x_{2k}$, and let $y_{2k} = z_{2k}/\|z_{2k}\|$. By theorem (3.8), the $\{y_{2k}\}$ have $R$ as a limit set. Hence there is a $K$ such that for $k \ge K$, all $y_{2k}$ lie in $N(R)$. By (3.21) then for $k \ge K$ we have

$\tfrac{1}{2} C^2 \le \dfrac{f(x_{2k+2})}{f(x_{2k})}$.

Letting $c_1 = \tfrac{1}{2} C^2$ completes the proof of the theorem.

Actually we could have taken $c_1 = C^2 - \epsilon$, for any $\epsilon > 0$.

(3.22) Corollary. With the hypotheses of theorem (3.17), there exist constants $d_1$, $d_2$ with

$0 < d_1 \le \dfrac{f(x_{k+1})}{f(x_k)} \le d_2 < 1$,

for all $k$.

Proof. The corollary follows from theorem (3.17), the inequalities (2.13), and the fact that $f(x_k) \downarrow 0$, as $k \to \infty$.


(3.23) Theorem. Fix $s \ge 1$. Let $x_0$ be any vector such that $x_2$ is parallel to $x_0$ in the optimum s-gradient method. In other words, $z_0/\|z_0\|$ is in the set $F(A)$ of (4.5), where $z_0 = A x_0$. Then

(3.24)  $\dfrac{f(x_{k+1})}{f(x_k)} = c^2$  ($k = 0, 1, 2, \ldots$),

where $c$ depends on $A$ and on $x_0$.

Remark. The import of this theorem is that, although the $x_k$ alternate between two fixed directions, as $k \to \infty$ the ratio (3.24) is constant for all $k$, and does not alternate.


Proof  of  (3.23).  We  first  note  from  Corollary  (2.23)  that  the 

2 

theorem  is  true  for  s  =  1,  and  that  (2.22)  gives  a  formula  for  c  in 

terms  of  the  two  nonzero  components  C2  of  zQ  . 

For any fixed $s > 1$, let $\pi$ be the 2-space spanned by $x_0$ and $x_1$. Let $f_\pi(x)$ be the restriction of $f(x) = \tfrac12 x^T A x$ to the subspace $\pi$. Then the vectors $x_0, x_1, \dots$ can be shown by a geometrical argument to be the successive iterates of the optimum 1-gradient method for finding the minimum of $f_\pi(x)$ in $\pi$, starting with $x_0$. Then (3.24) for $s = 1$ states that

$\dfrac{f_\pi(x_{k+1})}{f_\pi(x_k)} = c^2$,

for some constant $c$ depending on the eigenvalues of $f_\pi$. Since $f_\pi(x) = f(x)$ in $\pi$, this proves the theorem for $s$.

Presumably theorem (3.23) could somehow be proved from theorem (2.18), just as the case $s = 1$ follows from (2.22).

Corollary (3.22) could also be proved from theorem (3.23).
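The constancy of the ratio (3.24) can be checked numerically. The following is a minimal sketch, not the author's procedure: the helper name `s_gradient_step`, the diagonal test matrix, and the start vector are all illustrative assumptions. The start has exactly $s + 1$ nonzero components, which by theorem (4.8) is one way to satisfy the hypothesis that $x_2$ is parallel to $x_0$.

```python
import numpy as np

def s_gradient_step(A, x, s):
    """One step of the optimum s-gradient method for f(x) = 0.5 * x^T A x.

    Hypothetical helper: minimizes f over x + span{z, A z, ..., A^(s-1) z},
    where z = A x is the gradient at x.
    """
    z = A @ x
    cols = [z]
    for _ in range(s - 1):
        cols.append(A @ cols[-1])
    V = np.column_stack(cols)                 # n x s basis of search directions
    # Stationarity: V^T A (x + V g) = 0  ->  (V^T A V) g = -V^T z
    g = np.linalg.solve(V.T @ A @ V, -V.T @ z)
    return x + V @ g

def f(A, x):
    return 0.5 * x @ A @ x

A = np.diag([1.0, 2.0, 5.0, 9.0])             # illustrative positive definite matrix
s = 2
x = np.array([1.0, 0.0, -2.0, 0.5])           # exactly s + 1 = 3 nonzero components

ratios = []
for _ in range(6):
    x_next = s_gradient_step(A, x, s)
    ratios.append(f(A, x_next) / f(A, x))     # the ratio in (3.24)
    x = x_next

print(ratios)
```

Under these assumptions every entry of `ratios` should agree to rounding error, even though the iterates themselves alternate between two directions.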
4. Nature of the Asymptotic Directions.


We should like to characterize as well as we can the possible limiting vectors $r \in R$ of the (normalized) gradient vectors $y_{2k}$ of theorem (3.8). Since $T^2 r = r$, for $r$ in $R$, we have

(4.1)  $c\,r = P_s(A;\, Tr)\, P_s(A;\, r)\, r = Q_{2s}(A)\, r$,

where $c > 0$ is a constant and $Q_{2s}(t)$ is the product of the two polynomials $P_s(t;\, Tr)$ and $P_s(t;\, r)$. Letting $r = (\rho_1, \dots, \rho_n)^T$, we have

(4.2)  $c\,\rho_i = Q_{2s}(\lambda_i)\,\rho_i$   $(i = 1, \dots, n)$.

Recall from p. 44 of [12] that $P_s(t;\, Tr) = t^s + \dots$ and $P_s(t;\, r) = t^s + \dots$ are polynomials of degree $s$, each with $s$ distinct real zeros in the open interval $(\lambda_1, \lambda_n)$. Hence $Q_{2s}(t) = t^{2s} + \dots$ is a polynomial of degree $2s$ with $2s$ real zeros in the interval $(\lambda_1, \lambda_n)$, counting double zeros twice, if any. Now $c > 0$ in (4.2), which implies that for each $i$

(4.3)  $Q_{2s}(\lambda_i) = c > 0$  or  $\rho_i = 0$   $(i = 1, \dots, n)$.


Since $Q_{2s}(t)$ vanishes for some $t$ in $(\lambda_1, \lambda_n)$, the equation $Q_{2s}(t) = c > 0$ can have 2, 3, 4, ..., or $2s$ distinct real roots, which we call $\mu_j$ $(j = 1, \dots, m)$, and number so that

$\mu_1 < \mu_2 < \dots < \mu_m$.

(Here we count a multiple root of $Q_{2s}(t) = c$ only once.) Thus

$Q_{2s}(\mu_j) - c = 0$   $(j = 1, \dots, m)$.

By (4.3) each $\lambda_i$ for which $\rho_i \ne 0$ is one of the $\mu_j$.


(4.4) Definition. Given $x_0$. Let $R$ be the set of limiting points of the normalized gradients $\{y_{2k} : k = 0, 1, \dots\}$ of the optimum s-gradient method starting from $x_0$. For any vector $r = (\rho_1, \dots, \rho_n)^T$ in $R$, let $S$ be the set of $\lambda_i$ for which $\rho_i \ne 0$. Any such set is called an asymptotic spectrum of the optimum s-gradient method for the given $x_0$. Any $r$ in $R$ is called an asymptotic gradient vector of the same iteration.

Note that $R$ depends on $A$ and $x_0$, and we occasionally write $R(x_0, A)$ to make the dependence explicit. Note that $S$ is a property of $r$ only, and only indirectly of $x_0$.


(4.5) Definition. For a given $A$, we define the invariant set $F(A)$ of the optimum s-gradient method to be the set of unit vectors $r$ such that $T^2 r = r$.

We have shown in theorem (3.8) that, for any $x_0$, $R(x_0, A) \subset F(A)$. It is never true that $R(x_0, A) = F(A)$. However, it is true that

$F(A) = \bigcup_{x_0 \in E_n} R(x_0, A)$.

For, if $r \in F(A)$, then $T^2 r = r$, so that $r = R(r, A)$.


(4.6) Theorem. Given $x_0 = (\xi_1, \dots, \xi_n)^T$ with $\xi_i \ne 0$ $(i = 1, \dots, n)$. Assume $s < n$. Then both eigenvalues $\lambda_1$ and $\lambda_n$ belong to all asymptotic spectra $S$ of the optimum s-gradient method starting with $x_0$.


Proof. Assume that $\lambda_q$ $(q < n)$ is the largest eigenvalue in the asymptotic spectrum $S$ corresponding to an asymptotic vector $r$ of $R(x_0, A)$. The zeros of each $P_s(t;\, z_k)$ $(k = 0, 1, \dots)$ lie in the open interval $(\lambda_1, \lambda_n)$. Hence $P_s(\lambda_n;\, z_k) \ne 0$ for all $k$. Hence $\eta_n^{(2k)} \ne 0$ for all $k$, where the $\eta_i^{(2k)}$ are the components of $y_{2k} = z_{2k}/\|z_{2k}\|$.

Let $\tau$ be the largest zero of $P_s(t;\, Tr)\, P_s(t;\, r)$. Since the zeros of both $P_s(t;\, Tr)$ and $P_s(t;\, r)$ lie in the open interval $(\lambda_1, \lambda_q)$, we see that $P_s(t;\, Tr)\, P_s(t;\, r) \uparrow \infty$, as $t \uparrow \infty$, for $t > \tau$. Hence

$c = P_s(\lambda_q;\, Tr)\, P_s(\lambda_q;\, r) < P_s(\lambda_n;\, Tr)\, P_s(\lambda_n;\, r)$.

But then, by continuity,

$P_s(\lambda_q;\, z_{2k+1})\, P_s(\lambda_q;\, z_{2k}) \le \alpha\, P_s(\lambda_n;\, z_{2k+1})\, P_s(\lambda_n;\, z_{2k})$

for some $\alpha < 1$ and all $k \ge K$. Since all $\eta_n^{(2k)} \ne 0$, and since $\eta_q^{(2k_j)} \to \rho_q \ne 0$, for a certain subsequence $k_j$, this means that $|\eta_n^{(2k_j)}| \to \infty$, as $j \to \infty$. This is impossible, since all $y_{2k}$ lie on the unit sphere. Hence $q = n$, and $\lambda_n$ is in the asymptotic spectrum $S$.

The proof that $\lambda_1$ is in $S$ is analogous.

(4.7) Theorem. Given $x_0$ with $\xi_i \ne 0$ $(i = 1, \dots, n)$; assume that $s < n$. Let $m$ be the number of eigenvalues in any asymptotic spectrum $S$ of the optimum s-gradient method. Then

$s + 1 \le m \le 2s$.

Proof. Let $r \in R$ be an asymptotic gradient vector corresponding to a given $S$. As shown after (4.3), the asymptotic spectrum $S$ is a subset of the set of $t$ for which $P_s(t;\, Tr)\, P_s(t;\, r) = c$, and the number of such $t$ is between 2 and $2s$.

However, if $m \le s$, one step of the optimum s-gradient method would carry $r$ into the null vector $\theta$, and so $r$ could not belong to $R$. Hence $s + 1 \le m \le 2s$.

(4.8) Theorem. Suppose $s < n$. Let $x_0 = (\xi_1, \dots, \xi_n)^T$ be any vector in $E_n$ with exactly $s + 1$ nonzero components. Then $x_0, x_2, x_4, \dots$ are all collinear vectors. That is, the normalized gradient vector $y_0 = Ax_0/\|Ax_0\|$ is in the invariant set $F(A)$ of (4.5).

Proof. Let $z_0 = Ax_0$. It will suffice to prove that $z_2 = c_0 z_0$, for some positive constant $c_0$. Without loss of generality we may assume that $n = s + 1$, since the components for which $\xi_i = 0$ remain zero.
By (2.2)

(4.9)  $z_1 = z_0 + \gamma_1 A z_0 + \dots + \gamma_s A^s z_0$,

and $\gamma_1, \dots, \gamma_s$ are so chosen that $z_1$ is orthogonal to $z_0, Az_0, \dots, A^{s-1} z_0$. Because $s + 1$ components of $z_0$ are nonzero, the $s$ vectors $z_0, Az_0, \dots, A^{s-1} z_0$ are linearly independent. Hence the set $\{z_0, Az_0, \dots, A^{s-1} z_0\}$ forms a basis for the subspace of $E_{s+1}$ orthogonal to $z_1$.

Next, $z_2$ is formed as a linear combination of $z_1, Az_1, \dots, A^s z_1$ which is orthogonal to $z_1, Az_1, \dots, A^{s-1} z_1$. Since $z_2$ is orthogonal to $z_1$, it is expressible in terms of the basis $z_0, \dots, A^{s-1} z_0$:

(4.10)  $z_2 = c_0 z_0 + c_1 A z_0 + \dots + c_{s-1} A^{s-1} z_0$.

We shall prove that $c_1 = c_2 = \dots = c_{s-1} = 0$. Take the inner product of (4.10) with $Az_1$:

(4.11)  $z_1^T A z_2 = c_0\, z_1^T A z_0 + c_1\, z_1^T A^2 z_0 + \dots + c_{s-2}\, z_1^T A^{s-1} z_0 + c_{s-1}\, z_1^T A^s z_0$.

But $z_1^T A z_2 = z_2^T A z_1 = 0$ because $z_2$ is orthogonal to $Az_1$. And $z_1^T A z_0 = z_1^T A^2 z_0 = \dots = z_1^T A^{s-1} z_0 = 0$, because $z_1$ is orthogonal to $Az_0, A^2 z_0, \dots, A^{s-1} z_0$. And $z_1^T A^s z_0 \ne 0$, since otherwise by (4.11) $z_1$ would be $\theta$. It then follows from (4.11) that $c_{s-1} = 0$.
Next, taking the inner product of (4.10) with $A^2 z_1$ and using the same argument and the fact that $c_{s-1} = 0$, we show that $c_{s-2} = 0$. After taking the inner product of (4.10) with $Az_1, A^2 z_1, \dots, A^{s-1} z_1$, we will have proved that $c_{s-1} = \dots = c_2 = c_1 = 0$. Then, from (4.10), $z_2 = c_0 z_0$. That $c_0 > 0$ follows from the proof of (3.8). This completes the proof of theorem (4.8).
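Theorem (4.8) lends itself to a direct numerical check. The sketch below is illustrative only: the helper `s_gradient_step`, the matrix, and the start vector are assumptions chosen for the example, with $s = 2$ and exactly $s + 1 = 3$ nonzero components in a space of dimension 5.

```python
import numpy as np

def s_gradient_step(A, x, s):
    """Hypothetical helper: one optimum s-gradient step for f(x) = 0.5 * x^T A x."""
    z = A @ x
    cols = [z]
    for _ in range(s - 1):
        cols.append(A @ cols[-1])
    V = np.column_stack(cols)                 # columns z, A z, ..., A^(s-1) z
    # Minimize f over x + span(V): (V^T A V) g = -V^T z
    g = np.linalg.solve(V.T @ A @ V, -V.T @ z)
    return x + V @ g

A = np.diag([1.0, 2.0, 4.0, 7.0, 11.0])      # illustrative diagonal matrix
s = 2
x0 = np.array([3.0, 0.0, -1.0, 0.0, 0.5])    # exactly s + 1 nonzero components

x1 = s_gradient_step(A, x0, s)
x2 = s_gradient_step(A, x1, s)
z0, z2 = A @ x0, A @ x2

nz = z0 != 0                                  # indices of the nonzero components
c0 = z2[nz] / z0[nz]                          # componentwise ratio; should be one constant
print(c0)
```

If the theorem applies as stated, all entries of `c0` coincide and are positive, so that $z_2 = c_0 z_0$, while the components that start at zero stay at zero.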

Theorem (4.8) implies that any $s + 1$ eigenvalues of $A$ can be in the asymptotic spectrum for some start $x_0$. Moreover, any vector $r$ with exactly $s + 1$ nonzero components can be an asymptotic gradient vector of an iteration. This extends to $s \ge 2$ the known fact for the ordinary optimum 1-gradient method in 2 dimensions that any initial gradient direction is repeated at every other step of the iteration. See the end of Sec. 2 above, or p. 214 of Ostrowski [9].

That for all $s$ the period of the iteration in theorems (3.8) and (4.8) is 2, and not higher than 2, was a surprising fact to the author. However, the experiments of Khabaza [8] for $s = 3$ suggest the period 2.

For $s = 1$ we have $s + 1 = 2s = 2$, and then by theorem (4.7) all the vectors invariant under two steps of the optimum 1-gradient method are of the type covered in theorem (4.8). From this we can now show for $s = 1$ that the limiting set $R$ of theorem (3.8) is actually a single point. The following is a modification of Akaike's proof in [1] of the Forsythe-Motzkin conjecture [7].


(4.12) Theorem (Akaike). Let $s = 1$. Let $y_0 = (\eta_1^{(0)}, \dots, \eta_n^{(0)})^T$ be any vector in $E_n$ with $\eta_i^{(0)} \ne 0$ $(i = 1, \dots, n)$. Then the sequence $\{y_{2k} : k = 0, 1, \dots\}$ of normalized gradients converges to a single point $r$ whose spectrum is $\{\lambda_1, \lambda_n\}$. Moreover, $T^2 r = r$.
Proof. By theorem (3.8) the set of unit vectors $\{y_{2k} : k = 0, 1, \dots\}$ has a continuum $R$ as a limit set. By theorem (4.7), for any $r \in R$ the corresponding spectrum $S$ of $r$ has only 2 eigenvalues in it (for $s + 1 = 2s = 2$). Now by theorem (4.6) the two eigenvalues in $S$ must be $\lambda_1$ and $\lambda_n$. Let $r$ be any point of $R$; let $r = (\rho_1, 0, \dots, 0, \rho_n)^T$, with $\rho_1^2 + \rho_n^2 = 1$. Then $P_1(t;\, r) = t - \mu$, where $\mu = \lambda_1 \rho_1^2 + \lambda_n \rho_n^2$. Hence

$P_1(A;\, r)\, r = ((\lambda_1 - \mu)\rho_1,\ 0,\ \dots,\ 0,\ (\lambda_n - \mu)\rho_n)^T$.

By the proof of theorem (3.8),

$L = \lim_{k \to \infty} \varphi(w_k) = \varphi(r) = \|P_1(A;\, r)\, r\|^2$, since $\|r\|^2 = 1$,
$\phantom{L} = (\lambda_1 - \mu)^2 \rho_1^2 + (\lambda_n - \mu)^2 \rho_n^2$,

or

(4.13)  $L = (\lambda_n - \lambda_1)^2 \rho_1^2 \rho_n^2$.

Now $L$ is a number determined by the iteration, and $\lambda_1$, $\lambda_n$ are given eigenvalues, and $\rho_1^2 + \rho_n^2 = 1$. Hence the pair $\rho_1^2$, $\rho_n^2$ are determined by (4.13), up to an interchange at most. Hence the set $R$ can have at most eight vectors in it, if all permutations of signs are considered. But then, since $R$ is a continuum, it must consist of a single point, which we call $r$. Then $y_{2k} \to r$, as $k \to \infty$. This proves theorem (4.12).

Actually, if $r = (\rho_1, \dots, \rho_n)^T$, then $Tr = (\rho_n, \dots, -\rho_1)^T$, where all components $\rho_i = 0$ for $1 < i < n$. Then $r = \lim y_{2k}$ and $Tr = \lim y_{2k+1}$, as $k \to \infty$. So, the directions of the gradient vectors alternately approach the directions of $r$ and $Tr$, as $k \to \infty$.
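The algebra behind (4.13) and the form of $Tr$ can be verified on a small case. This is a sketch under stated assumptions: the function name `T`, the eigenvalues 1, 3, 10, and the components 0.6 and 0.8 are illustrative choices, and `T` is taken to be one steepest-descent step applied to a gradient followed by normalization.

```python
import numpy as np

def T(A, y):
    """Assumed form of T for s = 1: one optimum 1-gradient step on the
    gradient y (z_1 = z_0 - gamma * A z_0), followed by normalization."""
    gamma = (y @ y) / (y @ A @ y)             # exact line-search step length
    z = y - gamma * (A @ y)
    return z / np.linalg.norm(z)

lam = np.array([1.0, 3.0, 10.0])              # illustrative eigenvalues
A = np.diag(lam)
rho1, rhon = 0.6, 0.8                         # rho1^2 + rhon^2 = 1
r = np.array([rho1, 0.0, rhon])               # only extreme components nonzero

mu = r @ A @ r                                # zero of P_1(t; r) = t - mu
L_direct = np.linalg.norm((A - mu * np.eye(3)) @ r) ** 2
L_formula = (lam[-1] - lam[0]) ** 2 * rho1**2 * rhon**2   # right side of (4.13)

Tr = T(A, r)
TTr = T(A, Tr)
print(L_direct, L_formula, Tr, TTr)
```

With these values both expressions for $L$ evaluate to 18.6624, `Tr` equals $(\rho_n, 0, -\rho_1)^T = (0.8, 0, -0.6)^T$, and `TTr` returns to $r$.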
The reason we cannot extend our proof of theorem (4.12) to $s > 1$ is that the equation analogous to (4.13) involves between $s + 1$ and $2s$ unknown components of $r$, and we do not see how to limit $r$ to a finite number of vectors. Even for $s = 2$, theorem (4.8) shows that all vectors $r$ with 3 nonzero components are invariant under $T^2$. Prescribing the vector $r$ to have unit length and prescribing the value of $L$ reduce the number of free parameters in $r$ to 1. But, so far as the author can see, there remain infinitely many possible limiting vectors $r$ in $R$.

Moreover, for an even number $s \ge 2$, there are asymptotic spectra containing more than $s + 1$ eigenvalues, as will now be demonstrated. We shall consider only spectra with symmetry about a midpoint. We do not know whether there are asymptotic spectra with more than $s + 1$ eigenvalues without such a symmetry.

We shall first examine possible asymptotic spectra with an even number $2q$ of eigenvalues. Let the eigenvalues in $S$ be $a - \mu_q, a - \mu_{q-1}, \dots, a - \mu_1, a + \mu_1, \dots, a + \mu_{q-1}, a + \mu_q$, where $0 < a - \mu_q$ and $0 < \mu_1 < \dots < \mu_q$. Let us consider unit vectors $r$ with symmetric components $\rho_q, \dots, \rho_1, \rho_1, \dots, \rho_q$, corresponding to the respective points of the spectrum.

Because of the symmetry about the point $t = a$, the orthogonal polynomials $P_{2k}(t;\, r)$, $P_{2k+1}(t;\, r)$ associated with $S$ and the $\{\rho_i\}$ satisfy the conditions

(4.14)  $P_{2k}(t;\, r) = g_k((t - a)^2)$,

where $g_k$ is a monic polynomial of degree $k$;

(4.15)  $P_{2k+1}(t;\, r) = (t - a)\, h_k((t - a)^2)$,

where $h_k$ is a monic polynomial of degree $k$.

By symmetry, the even and odd polynomials $P_j(t;\, r)$ are automatically orthogonal. By (4.14) orthogonality of the $P_{2k}(t;\, r)$ among themselves can be expressed in the form

(4.16)  $\sum_{i=1}^{q} g_j(\mu_i^2)\, g_k(\mu_i^2)\, \rho_i^2 = 0$   $(j, k = 0, 1, \dots;\ j \ne k)$.

Thus the $g_k(t)$ are themselves orthogonal polynomials over the set $\mu_1^2, \dots, \mu_q^2$ with the weight factors $\rho_1^2, \dots, \rho_q^2$. Moreover, $\hat g_k(t) = (-1)^k g_k(a^2 - t)$ are monic orthogonal polynomials over the transformed set $\hat S = \{a^2 - \mu_1^2, \dots, a^2 - \mu_q^2\}$ with the same weights $\rho_1^2, \dots, \rho_q^2$. Note that $|\hat g_k(0)| = |P_{2k}(0;\, r)|$ and that $|\hat g_k(a^2 - \mu_i^2)| = |P_{2k}(a \pm \mu_i;\, r)|$ for $i = 1, \dots, q$. Hence $|\hat g_k(t)/\hat g_k(0)|$ has the same constant value over the set $\hat S$ that $|P_{2k}(t;\, r)/P_{2k}(0;\, r)|$ has over the set $S$.

By (4.15) the orthogonality of the $P_{2k+1}$ among themselves can be expressed as

(4.17)  $\sum_{i=1}^{q} h_j(\mu_i^2)\, h_k(\mu_i^2)\, \mu_i^2 \rho_i^2 = 0$   $(j, k = 0, 1, \dots;\ j \ne k)$.

Thus the $\hat h_k(t) = (-1)^k h_k(a^2 - t)$ are monic orthogonal polynomials over the set $\hat S = \{a^2 - \mu_1^2, \dots, a^2 - \mu_q^2\}$ with the different weights $\mu_1^2 \rho_1^2, \dots, \mu_q^2 \rho_q^2$. Note that $|\hat h_k(0)| = |h_k(a^2)| = |P_{2k+1}(0;\, r)|/a$, and that

$|\hat h_k(a^2 - \mu_i^2)| = |h_k(\mu_i^2)| = |P_{2k+1}(a \pm \mu_i;\, r)|/\mu_i$.

Thus constancy of $|P_{2k+1}(t;\, r)|$ over $S$ does not imply constancy of $|\hat h_k(t)|$ over $\hat S$. The even and odd polynomials transform differently.

By means of these orthogonal polynomials we can reduce the problem of the invariance of the $r$ under two steps of the optimum 2s-gradient method over $S$ to the problem of the invariance of an optimum s-gradient method over $\hat S$ in a space of half the dimension.

To be precise, the above relations imply the following result, which we do not prove.

(4.18) Theorem. If $s$ is even and $s + 1 \le 2q \le 2s$, then the vector $r = (\rho_q, \dots, \rho_1, \rho_1, \dots, \rho_q)^T$ (with no $\rho_i = 0$) is in the invariant set (4.5) for the optimum s-gradient method for the diagonal matrix of $2q$ nonzero elements

$\mathrm{diag}(a - \mu_q, \dots, a - \mu_1, a + \mu_1, \dots, a + \mu_q)$

if and only if the vector $\hat r = (\rho_1, \dots, \rho_q)^T$ (with no $\rho_i = 0$) is in the invariant set for the optimum (s/2)-gradient method for the diagonal matrix of $q$ nonzero elements

$\mathrm{diag}(a^2 + \mu_1^2, \dots, a^2 + \mu_q^2)$.

Moreover, when iterations exist with these invariance properties, if $z_0 = r/\|r\|$ and $\hat z_0 = \hat r/\|\hat r\|$, then $\|z_k\| = \|\hat z_k\|$ for $k = 0, 1, 2, \dots$, where $z_k$ and $\hat z_k$ are the gradient vectors of the respective iterations.

We do not know a comparable theorem for odd integers $s$.

As an application of theorem (4.18), we can show that for any $s$ of the form $s = 2^p$ $(p = 0, 1, 2, \dots)$, there exist vectors with $2s$ nonzero components that are in the invariant set of some optimum s-gradient method. For $p = 0$ this is theorem (4.12), and is true for any diagonal matrix of two positive elements $\mathrm{diag}(a^2 + \mu_1^2,\ a^2 + \mu_2^2)$ and any vector $\hat r = (\rho_1, \rho_2)^T$. Application of the first sentence of (4.18) leads to $s = 2$ with any matrix of form $\mathrm{diag}(b^2 + \nu_1^2,\ b^2 + \nu_2^2,\ b^2 + \nu_3^2,\ b^2 + \nu_4^2)$, where $b^2 + \nu_1^2 = a - \mu_2$, $b^2 + \nu_2^2 = a - \mu_1$, $b^2 + \nu_3^2 = a + \mu_1$, $b^2 + \nu_4^2 = a + \mu_2$, and corresponding vector $c\,(\rho_2, \rho_1, \rho_1, \rho_2)^T$. Another application of (4.18) leads to $s = 4$ with the matrix

$\mathrm{diag}(b - \nu_4, \dots, b - \nu_1, b + \nu_1, \dots, b + \nu_4)$

and corresponding vector $c'(\rho_2, \rho_1, \rho_1, \rho_2, \rho_2, \rho_1, \rho_1, \rho_2)^T$. It is clear that the process may be continued to $s = 2^p$ for any $p$.

Note from theorem (4.7) that $2s$ is the maximal number of nonzero components in any vector in the invariant set for an optimum s-gradient method. Our above example illustrates the maximal case.
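The $s = 2$ stage of this construction can be tried numerically. The following sketch assumes illustrative values $a = 5$, $\mu_1 = 1$, $\mu_2 = 3$, $\rho_1 = 0.3$, $\rho_2 = 0.9$ and a hypothetical helper `s_gradient_step`; it checks only that the symmetric gradient direction with $2s = 4$ nonzero components recurs after two steps.

```python
import numpy as np

def s_gradient_step(A, x, s):
    """Hypothetical helper: one optimum s-gradient step for f(x) = 0.5 * x^T A x."""
    z = A @ x
    cols = [z]
    for _ in range(s - 1):
        cols.append(A @ cols[-1])
    V = np.column_stack(cols)
    g = np.linalg.solve(V.T @ A @ V, -V.T @ z)   # minimize f over x + span(V)
    return x + V @ g

a, mu1, mu2 = 5.0, 1.0, 3.0                      # symmetric spectrum about t = a
A = np.diag([a - mu2, a - mu1, a + mu1, a + mu2])
rho1, rho2 = 0.3, 0.9
z0 = np.array([rho2, rho1, rho1, rho2])          # symmetric gradient, 2s = 4 nonzero entries
s = 2

x0 = np.linalg.solve(A, z0)                      # start whose gradient is z0
x1 = s_gradient_step(A, x0, s)
x2 = s_gradient_step(A, x1, s)
z2 = A @ x2

cosine = (z2 @ z0) / (np.linalg.norm(z2) * np.linalg.norm(z0))
print(cosine)
```

A cosine of 1 (to rounding error) indicates that $z_2$ is a positive multiple of $z_0$, that is, the normalized gradient lies in the invariant set for this matrix.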

We next consider symmetric asymptotic spectra with an odd number $2q + 1$ of eigenvalues $a - \mu_q, \dots, a - \mu_1, a, a + \mu_1, \dots, a + \mu_q$ and a corresponding symmetric vector

$(\rho_q, \dots, \rho_1, \rho_0, \rho_1, \dots, \rho_q)^T$

invariant under $T^2$. Then again the orthogonal polynomials take the forms (4.14) and (4.15). The odd polynomials are still defined by the condition (4.17), but the condition (4.16) must be replaced by

(4.19)  $2 \sum_{i=1}^{q} g_j(\mu_i^2)\, g_k(\mu_i^2)\, \rho_i^2 + g_j(0)\, g_k(0)\, \rho_0^2 = 0$   $(j, k = 0, 1, \dots;\ j \ne k)$.

The analog of theorem (4.18) is now stated, but not proved:

(4.20) Theorem. If $s$ is even and $s + 1 \le 2q + 1 \le 2s$, then the vector $r = (\rho_q, \dots, \rho_1, \rho_0, \rho_1, \dots, \rho_q)^T$ (with no $\rho_i = 0$) is in the invariant set (4.5) for the optimum s-gradient method for the diagonal matrix of $2q + 1$ nonzero elements

$\mathrm{diag}(a - \mu_q, \dots, a - \mu_1, a, a + \mu_1, \dots, a + \mu_q)$

if and only if the vector $\hat r = (\rho_0/\sqrt{2}, \rho_1, \dots, \rho_q)^T$ (with no $\rho_i = 0$) is in the invariant set for the optimum (s/2)-gradient method for the diagonal matrix of $q + 1$ elements

$\mathrm{diag}(a^2, a^2 + \mu_1^2, \dots, a^2 + \mu_q^2)$.

Moreover, when iterations exist with these invariance properties, if $z_0 = r/\|r\|$ and $\hat z_0 = \hat r/\|\hat r\|$, then $\|z_k\| = \|\hat z_k\|$ for $k = 0, 1, 2, \dots$, where $z_k$ and $\hat z_k$ are the gradient vectors of the respective iterations.

If $s$ is odd, then the set of $2q + 1$ eigenvalues $\{a - \mu_q, \dots, a - \mu_1, a, a + \mu_1, \dots, a + \mu_q\}$ can never be the asymptotic spectrum of an optimum s-gradient iteration.
The first two sentences are strict analogs of theorem (4.18). The third is true because $P_{2k+1}(a;\, r) = 0$ for all $k$.

The signs of the $\rho_i$ are of no importance in theorems (4.18) and (4.20), and any $\rho_i$ could be left alone or replaced by $-\rho_i$ independently at any place it is mentioned.



5. Singular and Derogatory Quadratic Forms; Zero Components.

Two restrictions placed on $A$ above are really irrelevant--that $A$ be regular and nonderogatory. If $A$ is singular, then for some $p \ge 1$, we have $\lambda_1 = \dots = \lambda_p = 0 < \lambda_{p+1} < \dots < \lambda_n$. Then it follows from (2.12) that

$\xi_i^{(k+1)} = \xi_i^{(k)}$,   for $1 \le i \le p$; $k = 0, 1, 2, \dots$,

while all components $\xi_i^{(k)} \to 0$, as $k \to \infty$, for $p + 1 \le i \le n$. On the other hand $f(x) = \tfrac12 x^T A x = \tfrac12 \sum_{i=1}^{n} \lambda_i \xi_i^2 = \tfrac12 \sum_{i=p+1}^{n} \lambda_i \xi_i^2$. Thus $f(x)$ is minimized for all vectors in the subspace $N$ where $\xi_{p+1} = \dots = \xi_n = 0$ and the gradient methods proceed from $x_0$ to the closest point $x_\infty$ of $N$, with all $x_k - x_\infty$ and all gradients $z_k$ located in the orthogonal complement of $N$.
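The behavior for singular $A$ can be illustrated with a small experiment. This is a sketch only: the helper `s_gradient_step`, the matrix with a two-dimensional null space ($p = 2$), and the start vector are assumptions made for the example.

```python
import numpy as np

def s_gradient_step(A, x, s):
    """Hypothetical helper: one optimum s-gradient step for f(x) = 0.5 * x^T A x."""
    z = A @ x
    cols = [z]
    for _ in range(s - 1):
        cols.append(A @ cols[-1])
    V = np.column_stack(cols)
    g = np.linalg.solve(V.T @ A @ V, -V.T @ z)   # minimize f over x + span(V)
    return x + V @ g

def f(A, x):
    return 0.5 * x @ A @ x

A = np.diag([0.0, 0.0, 1.0, 3.0, 6.0])           # singular: lambda_1 = lambda_2 = 0
s = 2
x0 = np.array([2.0, -1.0, 1.0, 1.0, 1.0])

x = x0.copy()
for _ in range(10):
    x = s_gradient_step(A, x, s)

z = A @ x                                        # final gradient
print(x, f(A, x) / f(A, x0))
```

In this run the first two components of `x` never move, the remaining components shrink toward zero, and the gradient has no component in the null space, in line with the description of $N$ above.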

If $A$ is derogatory, it has multiple eigenvalues but a complete set of eigenvectors (because $A$ is symmetric). Suppose, for example, that $0 < \lambda_1 = \lambda_2 = \dots = \lambda_r < \lambda_{r+1} < \dots < \lambda_n$, and suppose that

$x_0 = (\xi_1^{(0)}, \dots, \xi_r^{(0)}, \xi_{r+1}^{(0)}, \dots, \xi_n^{(0)})^T$.

Now the orthogonal basis of eigenvectors belonging to $\lambda_1, \dots, \lambda_r$ is not uniquely defined. Our preceding analysis required at various places (e.g., in the proof of (4.8)) that the $\lambda_i$ be distinct for each nonzero component $\xi_i^{(0)}$, but zero components $\xi_i^{(0)}$ were ignored. If any of $\xi_2^{(0)}, \dots, \xi_r^{(0)}$ are nonzero, make an orthogonal transformation of the eigenvector basis so that $x_0$ takes the form

$x_0 = \left( \left( (\xi_1^{(0)})^2 + \dots + (\xi_r^{(0)})^2 \right)^{1/2},\ 0,\ \dots,\ 0,\ \xi_{r+1}^{(0)},\ \dots,\ \xi_n^{(0)} \right)^T$.

Then drop the new zero components $\xi_2^{(0)}, \dots, \xi_r^{(0)}$ entirely, and effectively reduce $A$ to a nonderogatory matrix of order $n - r + 1$.

Thus, in effect, only the set and number of distinct nonzero eigenvalues of $A$ have a real relevance to the gradient methods for quadratic functions $\tfrac12 x^T A x$.

Moreover, zero components of any $x_k$ should be ignored, and the order of $A$ reduced by unity for each zero component $\xi_i^{(k)}$ that occurs at any stage of the iteration.

If fewer than $s + 1$ components of any $x_k$ are nonzero, then $x_{k+1} = \theta$ and the iteration terminates at once. Hence we have always insisted that at least $s + 1$ components of $x_0$ be nonzero. Even so, one may ask, might not enough $P_s(\lambda_i;\, z_k)$ be "accidentally" zero, so that for some later $x_k$ fewer than $s + 1$ components are nonzero? The answer is negative, as the following theorem shows:


(5.1) Theorem. Assume $s + 1 \le n$. Assume $\xi_i^{(k)} \ne 0$ for $i = 1, \dots, n$. Then at least $s + 1$ components $\xi_i^{(k+1)}$ are nonzero.

Proof. By (2.12), $\xi_i^{(k+1)} = P_s(\lambda_i;\, z_k)\, \xi_i^{(k)}$, up to a multiplicative constant that does not matter, where $P_s(t;\, z_k)$ is the orthogonal polynomial of degree $s$ over the set $\{\lambda_1, \dots, \lambda_n\}$ with weights $[\zeta_i^{(k)}]^2$, the squared components of $z_k$. We shall prove that there exist $s + 1$ eigenvalues out of the $\lambda_i$:

(5.2)  $\lambda'_1 < \lambda'_2 < \dots < \lambda'_{s+1}$,

such that $P_s(\lambda'_i;\, z_k)\, P_s(\lambda'_{i+1};\, z_k) < 0$ for $i = 1, \dots, s$. A fortiori, $P_s(\lambda'_i;\, z_k) \ne 0$ for $i = 1, 2, \dots, s + 1$, and the theorem will have been proved.

If the above sign-alternation property is false, then let $q \le s$ be the largest integer such that we can find $\lambda'_1 < \dots < \lambda'_q$ with

(5.3)  $P_s(\lambda'_i;\, z_k)\, P_s(\lambda'_{i+1};\, z_k) < 0$,   $i = 1, \dots, q - 1$.

(Clearly some $q \ge 2$ exists, or else $P_s(\lambda_i;\, z_k)$ would always be of one sign and hence $P_s$ could not be orthogonal to $P_0 = 1$.) Then pick $\mu_1, \dots, \mu_{q-1}$ with

$\lambda'_1 < \mu_1 < \lambda'_2 < \mu_2 < \dots < \lambda'_{q-1} < \mu_{q-1} < \lambda'_q$,

so that, if $Q(t) = (t - \mu_1) \cdots (t - \mu_{q-1})$, then $P_s(\lambda_i;\, z_k)\, Q(\lambda_i) \ge 0$ for all $i = 1, \dots, n$. (We omit details of the construction.) Then

$(P_s(t;\, z_k),\ Q(t)) = \sum_{i=1}^{n} P_s(\lambda_i;\, z_k)\, Q(\lambda_i)\, [\zeta_i^{(k)}]^2 > 0$,

so that $P_s$ and $Q$ are not orthogonal. But, since $Q$ is of degree $q - 1 \le s - 1$, $P_s$ must be orthogonal to $Q$. This contradiction completes the proof of theorem (5.1).
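The sign-alternation property used in this proof can be observed numerically. The sketch below is an illustration under assumptions: `s_gradient_step`, the eigenvalues, and the start vector are hypothetical choices. For diagonal $A$ the componentwise quotient of successive iterates is proportional to $P_s(\lambda_i;\, z_k)$, so its sign pattern along the ordered eigenvalues can be counted directly.

```python
import numpy as np

def s_gradient_step(A, x, s):
    """Hypothetical helper: one optimum s-gradient step for f(x) = 0.5 * x^T A x."""
    z = A @ x
    cols = [z]
    for _ in range(s - 1):
        cols.append(A @ cols[-1])
    V = np.column_stack(cols)
    g = np.linalg.solve(V.T @ A @ V, -V.T @ z)   # minimize f over x + span(V)
    return x + V @ g

lam = np.array([1.0, 2.0, 3.0, 5.0, 8.0, 13.0])  # distinct eigenvalues, increasing
A = np.diag(lam)
s = 2
x = np.array([1.0, 0.7, -1.3, 0.4, 2.1, -0.5])   # all components nonzero

records = []
for _ in range(5):
    x_next = s_gradient_step(A, x, s)
    p = x_next / x                               # proportional to P_s(lambda_i; z_k)
    sign_changes = int(np.sum(p[:-1] * p[1:] < 0))
    nonzero = int(np.count_nonzero(x_next))
    records.append((sign_changes, nonzero))
    x = x_next

print(records)
```

Since a polynomial of degree $s$ changes sign at most $s$ times, and the proof supplies $s + 1$ eigenvalues with alternating signs, each step should show exactly $s$ sign changes and at least $s + 1$ nonzero components.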

